Geothermal machine learning analysis: Southwest New Mexico

This notebook is part of GTcloud.jl: GeoThermal Cloud for Machine Learning.

Machine learning analyses are performed using the SmartTensors machine learning framework.

This notebook demonstrates how the NMFk module of SmartTensors can be applied to perform unsupervised geothermal machine-learning analyses.

More information on how the ML results are interpreted to provide geothermal insights is available in our research paper.

Import required libraries for this work

If NMFk is not installed, first execute the following in the Julia REPL: import Pkg; Pkg.add("NMFk"); Pkg.add("DelimitedFiles"); Pkg.add("JLD"); Pkg.add("Gadfly"); Pkg.add("Cairo"); Pkg.add("Fontconfig"); Pkg.add("Mads")

Load and pre-process the data

Set up the working directory containing the SWNM data

Load the data file
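As a minimal sketch of this loading step, the snippet below reads a comma-separated file with a header row using DelimitedFiles. The temporary file and the column names `temperature` and `heat_flow` are hypothetical stand-ins so that the example runs on its own; they are not the SWNM data file or its attributes.

```julia
import DelimitedFiles

# Hypothetical stand-in for the SWNM data file: a tiny CSV written to a temp path.
path, io = mktemp()
write(io, "temperature,heat_flow\n41.5,80.2\n38.0,95.1\n")
close(io)

# With header=true, readdlm returns the data matrix and a 1 x n header row.
data, header = DelimitedFiles.readdlm(path, ','; header=true)

# Convert the header row into a plain vector of attribute names.
attributes = vec(String.(header))
```

Each row of `data` then corresponds to one location and each column to one attribute, which is the layout assumed in this sketch.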

Define names of the data attributes (matrix columns)

Short attribute names are used for coding.

Long attribute names are used for plotting and visualization.

Define attributes to remove from analysis

Define attributes for analysis

Define names of the data locations

Short location names are used for coding.

Long location names are used for plotting and visualization.

Define location coordinates

Set up directories to store results and figures

Define a range for the number of signatures to be explored

Define and normalize the data matrix
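The normalization step can be illustrated with a small column-wise min-max scaling sketch. The helper `minmax_columns` and the toy matrix are hypothetical; the notebook's own normalization routine is not reproduced here and may differ in detail (for example in how missing values are handled).

```julia
# Hypothetical helper: scale every column of X to the range [0, 1].
function minmax_columns(X::AbstractMatrix{<:Real})
    lo = minimum(X; dims=1)      # 1 x n row of column minima
    hi = maximum(X; dims=1)      # 1 x n row of column maxima
    span = hi .- lo
    span[span .== 0] .= 1        # constant columns map to 0 instead of NaN
    return (X .- lo) ./ span, lo, hi
end

# Toy matrix: 3 locations (rows) by 2 attributes (columns).
X = [1.0 10.0; 2.0 30.0; 3.0 20.0]
Xn, lo, hi = minmax_columns(X)
```

Returning the column minima and maxima alongside the scaled matrix makes it possible to map results back to the original units later.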

Perform ML analyses

The NMFk algorithm factorizes the normalized data matrix Xu into W and H matrices so that Xu ≈ W × H. For more information, check out the NMFk website.
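To make the factorization idea concrete, here is a minimal sketch of plain nonnegative matrix factorization for a fixed number of signatures k, using the classical multiplicative update rules. This is a simplified stand-in and not the NMFk implementation: NMFk additionally repeats the factorization from many random starts and assesses the robustness of the solutions to estimate k. The function name `nmf_multiplicative` and the synthetic matrix are assumptions made for illustration.

```julia
import Random
import LinearAlgebra

# Plain NMF with multiplicative updates: X ≈ W * H with W, H >= 0.
function nmf_multiplicative(X::AbstractMatrix{<:Real}, k::Integer;
                            iterations::Integer=500, seed::Integer=1)
    rng = Random.MersenneTwister(seed)
    m, n = size(X)
    W = rand(rng, m, k)          # random nonnegative start
    H = rand(rng, k, n)
    tiny = 1e-12                 # guards against division by zero
    for _ in 1:iterations
        H .*= (W' * X) ./ (W' * W * H .+ tiny)
        W .*= (X * H') ./ (W * H * H' .+ tiny)
    end
    return W, H
end

# Synthetic nonnegative matrix built from 2 hidden signatures.
Xdemo = rand(Random.MersenneTwister(7), 6, 2) * rand(Random.MersenneTwister(8), 2, 5)
W, H = nmf_multiplicative(Xdemo, 2)

# Relative reconstruction error of the factorization.
relerr = LinearAlgebra.norm(Xdemo - W * H) / LinearAlgebra.norm(Xdemo)
```

Because the updates only multiply by nonnegative ratios, W and H stay nonnegative throughout, which is what allows the factors to be read as additive signatures.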

Here, the NMFk results are loaded from prior ML runs.

As seen from the output above, the NMFk analyses identified that the optimal number of geothermal signatures in the dataset is 8.

Solutions with a number of signatures less than 8 are underfitting.

Solutions with a number of signatures greater than 8 are overfitting and unacceptable.

The set of acceptable solutions is defined as follows:

The acceptable solutions contain 2, 3, 4, 5 and 8 signatures.

Post-process NMFk results

Number of signatures

Plot representing solution quality (fit) and silhouette width (robustness) for different numbers of signatures k:

The plot above also demonstrates that the acceptable solutions contain 2, 3, 4, 5 and 8 signatures. Note that a solution is accepted if its robustness (silhouette width) is greater than 0.25.
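The acceptance rule stated above (robustness greater than 0.25) can be sketched as a simple filter. The range `2:10` and the silhouette values below are placeholders chosen only so that the filter returns the set of k quoted in the text; they are not the silhouette widths computed in this analysis, and `acceptable_ks` is a hypothetical helper.

```julia
# Hypothetical helper: keep every k whose robustness exceeds the cutoff.
function acceptable_ks(ks, robustness; cutoff=0.25)
    return [k for (k, r) in zip(ks, robustness) if r > cutoff]
end

ks = 2:10                                  # assumed range of explored k
# Placeholder silhouette widths, one per k (NOT results of the analysis).
silhouette_placeholder = [0.9, 0.8, 0.7, 0.6, 0.1, 0.0, 0.5, -0.1, -0.2]

accepted = acceptable_ks(ks, silhouette_placeholder)
```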

Analysis of all the acceptable solutions

The ML solutions containing an acceptable number of signatures are further analyzed as follows:

Analysis of the 5-signature solution

The results for the 5-signature solution presented above are discussed further here.

The geothermal attributes are clustered into 5 groups:

This grouping is based on analyses of the attribute matrix W:

[Figure: attributes-3-labeled-sorted]

The well locations are also clustered into 5 groups:

This grouping is based on analyses of the location matrix H:

[Figure: locations-4-labeled-sorted]
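One simple way to turn a factor matrix into groups is to assign each item (an attribute or a location) to the signature on which it has the largest scaled weight. The sketch below illustrates that idea only; it is a simplified stand-in, and the clustering used in the notebook may differ. The helper `dominant_signature`, the item names and the toy matrix are hypothetical, and the matrix is assumed to hold items along rows and signatures along columns, with no all-zero column (a matrix stored the other way round can be passed as `permutedims(M)`).

```julia
# Hypothetical helper: index of the dominant signature for every row of F.
function dominant_signature(F::AbstractMatrix{<:Real})
    scaled = F ./ maximum(F; dims=1)   # scale each signature column to a maximum of 1
    return [argmax(scaled[i, :]) for i in axes(scaled, 1)]
end

# Toy factor matrix: 4 items (rows) by 2 signatures (columns).
item_names = ["a1", "a2", "a3", "a4"]
F = [0.9 0.1; 0.2 0.8; 0.7 0.3; 0.0 0.5]

labels = dominant_signature(F)

# Collect the item names that fall into each signature group.
groups = Dict(s => item_names[labels .== s] for s in unique(labels))
```

Scaling each column first keeps a signature with uniformly large weights from absorbing every item.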

The map ../figures-case01/locations-5-map.html provides an interactive visualization of the extracted location groups (the HTML file can also be opened with any browser).

Comparison of the ML solutions against the SWNM physiographic provinces

Spatial association of the extracted signatures with the four physiographic provinces in SWNM is summarized here:

[Figure: signatures]

The solutions for k=2, 3, and 4 provide a higher-level generalization of the geothermal signatures, while the k=8 solution allows us to refine them further. The solution for k=5 provides an appropriate generalization for the data used. Clearly, the ML algorithm was able to blindly identify the physiographic provinces associated with the analyzed hydrogeothermal systems without being given any information about their location (coordinates).

Description of signatures for spatial associations

Contribution of each location to each signature. This plot shows the signatures of all accepted solutions together. The number of signatures increases from left to right, which shows how the signatures evolve as their number increases.

[Figure: Ws]

Description of attribute matrices

The plot below shows the attribute matrices of all accepted solutions. The number of signatures increases from left to right. The plot shows how each attribute contributes to each signature and how the signatures evolve as their number increases.

[Figure: Hs]

Optimal signatures

The figure below shows the map of the optimal signatures. The k=5 solution best characterizes the spatial associations and geothermal attributes of the SWNM.

[Figure: Ws]

Signatures and corresponding resource type, dominant attributes, physical significance and physiographic provinces

| Signature | Resource type | Dominant attributes | Physical significance | Physiographic province |
|---|---|---|---|---|
| A | Low temperature | Gravity anomaly; Magnetic intensity; Volcanic dike density; Drainage density; Li+ concentration | Shallow heat flow | Southern MDVF |
| B | Medium temperature | B+ and Li+ concentrations; Gravity anomaly; Magnetic intensity; Quaternary fault density; Silica geothermometer; Heat flow; Depth to the basement | Deep heat flow | Southern Rio Grande rift |
| C | Low temperature | B+ and Li+ concentrations; Magnetic intensity; Drainage density; Crustal thickness | Deep heat source | Colorado Plateau |
| D | Low temperature | Drainage density; Fault intersection density; Seismicity; State map fault density; Spring density; Hydraulic gradient | Tectonics | Northern Rio Grande rift |
| E | Medium temperature | Drainage density; State map fault density; Precipitation; Silica geothermometer; Hydraulic gradient | Vertical hydraulics | Northern MDVF |

Geothermal resource assessment

Medium-temperature hydrothermal systems
Low-temperature hydrothermal systems

For more details, please see our paper titled "Discovering Hidden Geothermal Signatures using Unsupervised Machine Learning."